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Abstract. Time series data may exhibit clustering over time and, in a multiple time series 
context, the clustering behavior may differ across the series. This paper is motivated by the 
Bayesian non— parametric modeling of the dependence between the clustering structures and 
the distributions of different time series. We follow a Dirichlet process mixture approach and 
introduce a new class of multivariate dependent Dirichlet processes (DDP). The proposed DDP 
are represented in terms of vector of stick-breaking processes with dependent weights. The 
weights are beta random vectors that determine different and dependent clustering effects along 
the dimension of the DDP vector. We discuss some theoretical properties and provide an 
efficient Monte Carlo Markov Chain algorithm for posterior computation. The effectiveness of 
the method is illustrated with a simulation study and an application to the United States and 
the European Union industrial production indexes. 
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1. Introduction 

This paper is concerned with some multivariate extensions of the Poisson-Dirichlet process. 
In t his paper t he class of m odels considered originates from the Ferguson Dirichlet process (DP) 



(see iFergusonl (|1973l LL97J)) that is now widely used in non-parametric Bay esian statistic s . Our 



extensions rely upon the so-called Sethuraman's representation of the DP. ISethuramanl ((1994) 
shows that, given a Polish space (X, X) and a probability measure Hq on X , a random probability 
measure G on X is a Dirichlet Process of precision parameter a > and base measure Hq, in 
short DP(a, Hq), if and only if it has the stick-breaking representation: 

(1) G(-) = E^M0, 

k>l 

where (tf>k)k and (Wk)k are stochastically independent, {(f>k)k is a sequence of independent random 
variables (atoms) with common distribution Hq (base measure), and the weights WkS are defined 
by the stick-breaking construction: 

(2) Wk-^SkUil-Si), 
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Sj being independent random variables with beta distribution Beta(l,a). 

The DP has been extended in many directions. In this paper, we will build on a generalization of 
the DP that is called the Poisson-Dirichlct process. A Poisson-DP, PD(a, I, Ho), with parameters 
< I < 1 and a > — I and base measure Ho, is a random probability measure that can be 
represented with a Sethuraman-like construction by taking in dU)- (Eb a se quenc e of independent 



rando m variables Sk with Su ~ Beta(l — I, a + Ik), see iPitmanl (120061 ) and 



Pitman and Yor 



(1997 ). Further general i zation s based on the stick-breaking construction of the DP can be found 



Ishwaran and Jamesl ( 2001 ) 



The DP process and its univariate extensions are now widely used in Bayesian non -parametric 



statist ics. A recent account of Bayesian non-parametric inference can be found in iHiort et al 



(|2010f ) . The univariate DP is usually emplo yed as a p rior for a mixing distribution, resulting in a 



DP mixture (DPM) model (see for example IloI (jl984l )). The DPM models incorporate DP priors 
for parameters in Bayesian hierarchical models, providing an extremely flexible class of models. 
More specifically, the DPM models define a random density by setting: 



(3) 



f(y) = f JC(y\8)G(d0) =^W k }C(y\0 k ), 

J k>l 



where K, is a suitable density kernel. Due to the av ailab ility of simple and efficie nt methods for 



posterior computation, starting from 



Escobarl ( 1994 ) and 



Escobar and Westl ( 1995 ). DPM models 



are now routinely implemented and used in many fields. 

The first aim of this paper is to introduce new class of vectors of Poisson-Dirichlet processes and 
of DPM models. Vectors of random probability measures arise naturally in generalizations of the 
DP and DPM models that accommodate dependence of observations on covariates or time. Using 
covariates, data may be divided into different groups and this leads to consider group-specific 
random probability measures and, as we shall see, to assume that the observations are partially 
exchangeable. 

Probably the fir st paper in this direction, tha t introduced vect ors of priors for partially ex- 



MacEachern (1999 



20011 ) in- 



changeable data, is ICifarelli and Regazzinil (|1978l ) . More recently, 
troduced the so-called dependent Dirichlct process (DDP) and the DDP mixture models. His 
specification of the DDP incorporates dependence on covariates through the atoms while as- 
suming fixed weights. More specifically, the atoms in the Scthuraman's representation (fT|)-([3|) 
are replaced with stochastic processes (p k {z), z being a se t of covariates. Ther e exist many 
applications of this specification of the DDP. For inst ance, 



ANOVA-type dependence structure for the atoms, while 



De Iorio et al 



Gelfand et al 



(|200J) proposed an 



20041 ) considered a spatial 



dependence structure for the atoms. Later, DDP with both dependent atoms and weights was 
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introduced in 



Griffin and Steell ([2006). Other constructio ns that inco r porat e a dependence struc 



turein the weights have been prop o sed, for instan c e, in 
pOllUPunson and Peddadal |2008l) : 



Dunson et al 



(120081) and 



Duan et al 



(120071); IChung and Dunson 



Rodriguez et al 



2010|). 



Other approaches to the definition of dependent vec tors of random mea s ures rely upon either 



suitab le convex combinations of independent DPs (e 



(120061) : 



breakings (e.g. 



Hatiispyrosa et al 



Teh et al 



(2011); 



( 20061) : 



Kolossiatis et al 



Miiller et al 



Sudderth and Jordan 



(2004); 



Pennell and Dunson 



( 2011 )) or hierarchical structures of stick- 



20091) ). 



Finally, we should note that it is possible to follow alternative routes other than the Scthu- 



ample, 


Leisen and Liioi 


Ishwaran and Zarepour 



In this paper, we introduce a new class of multivariate Poisson-DP and DP by using a vector of 
stick-breaking processes with multivariate dependent weights. In the construc tion of the dependent 



Nadaraiah and Kotz 



weigh ts, we consider the class of multivariate beta distributions introduced by 
(|2005l ) that have a tractable stochastic representation and makes the Bayesian inference procedures 
easier. We discuss some properties of the resulting multivariate DP and Poisson-DP and show 
that our process has the appealing property that the marginal distributions are DP or Poisson-DP. 

The second aim of the paper is to apply the new DP to Bayesian non-parametric inference 
and to provide a simple and efficient method for posterior computation of DDP mixture mod- 

1 the m ultivariate context the slice 
(|201ll ). The sampling methods for 



Walker 


(2007 


) and 


Kalli et al. 


(2011) 



the full conditional distributions of the resulting Gibbs sampling procedure are detailed and the 
effectiveness of the proposed algorithm is studied with a set of simulation experiments. 

Another contribution of this paper is to present an application of the proposed multivariate DDP 
mixture models to multivariate time scries modeling. In the recent years, the interest in Bayesian 
non-parametric models for t ime series has increased. In th is context, DP have been recently 
employed in different ways. 



Rodriguez and ter Horst 



2008 |) used a Dirichlet process to define 



an infinite mixture of time series models 
finite mixture of independent Dirichlet process mixtures. 
Dirichlet process mixture of stochastic volatility models and 



Taddv and Kottas J 200S ) proposed a Marko v-switching 



Jensen and Maheul (|2010h considered 



Griffin! (| 2 1 lh proposed a continuous- 



time non-parametric model for volatility. A flexible non-parametric model w ith a time-varying 
stick-breaking process has been recently proposed by iGriffin and Steell (|201ll ). In their model, a 
sequence of dependent Dirichlet processes is used for capturing time-variations in the clustering 
structure of a set of time series. In our paper, we extend the existing Bayesian non-parametric 
models for multiple time series by allowing each series to have a possible different clustering 
structure and by accounting for dependence between the series-specific clusterings. Since we obtain 



4 FEDERICO BASSETTI, ROBERTO CASARIN, AND FABRIZIO LEISEN 

a dynamic infinite-mixture model and since the number of components with negligible weights can 
be different in each series, our model represents a non-parametric alternative to multivariate 
dynamic hnitc- mixture models (e.g., Markov-switching models) that are usually employed in time 
series analysis. 

The structure of the paper is as follows. Section[2]introduces vectors of dependent stick-breaking 
processes and defines their properties. Section [3] introduces vectors of Poisson-Dirichlet processes 
for prior modelling. Section @] proposes a Monte Carlo Markov Chain (MCMC) algorithm for 
approximated inference for vector of DP mixtures. Section [5] provides some applications to both 
simulated data and to the time series of the industrial production index for the United States and 
the European Union. Section [6] concludes the paper. 

2. Dependent stick-breaking processes 

Consider a set of observations, taking values in a space Y, say a subset of R d , divided in r 
sub samples (or group of observations), that is: 

Y^j i — 1 , . . . r, j 1 , . . . , TL{ . 

Above Yij is the j-th observation within sub-sample i. For instance, i may correspond to a space 
label or predictors. Typically, one assumes that the observations of the block i have the same 
(conditional) density /, and that the observations are (conditionally) independent. Hence, to 
perform a non-parametric Bayesian analysis of the data, one needs to specify a prior distribution 
for the vector of densities (/i, f%, . . . , f r ). Moreover, in assessing a prior for (/i, fa, . . . , f r ), a 
possible interest is on borrowing information across blocks. To do this, first we introduce a 
sequence of density kernels K,\ :YxX4 [0, 1] (i = 1, . . . , r) (where JCi is jointly measurable and 
C i— > J c Ki(y\x)v{dy) defines a probability measure on Y for any x in X, v being a dominating 
measure on Y). Secondly we define: 

(4) My) := J K l {y\0)G l {d9) i = l,...,r, 

where (Gi, . . . , G r ) is a vector of dependent stick breaking processes that will be defined in the 
next section. 



2.1. Vectors of stick-breaking process es. Followin 



a general d efinition of dependent stick- 



20011 ). we let 



breaking processes, essentially proposed in iMacEachernl ((1999, 
(5) Gi{-) :=Y,W ik hA') i = l,...,r. 

k>l 

where the vectors of weights Wu = (Wik, ■ ■ ■ , W r k) and the atoms (pk = {fiik, ■ ■ ■ , <Prk) satisfy the 
following hypotheses: 

• {<fk)k an d (Wk)k are stochastically independent; 
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• (<Pk)k is an i.i.d. sequence taking values in W with common probability distribution Go; 

• (Wfe)fe are determined via the stick breaking construction 

Wik-SikHil-Sij), i = l,...,r 

j<k 

where Yll = 1 for z > j and Sk = (Sik, ■ ■ ■ , S r k) are stochastically independent random 
vectors taking values in [0, l] r such that Ylk>i ^ik = 1 a.s. for every i. 

Note that (Gi, . . . , G r ) is a vector of dependent random measures whenever (Sn, . . . , SVi) or 
((/?n, . . . , <^ r i) are vectors of dependent random variables. The dependence between the measures 
affects the dependence structure underlying the densities fi,...,f r , which can be represented as 
infinite mixtures 



(6) My) = ^2w ik lC i {y\(p ik ) 



k>l 

functions of the atoms (tpk)k an d the weights (Wk)k- 

The above definition of dependent random measures is quite general. For the sake of complete- 
ness, we shall notice that our specification of vectors of stick-breaking processes can be extended, 
even if not straightfor wardly, up to include more rich structure such as the matrix of stick-breaking 



(|2008l ) . In the rest of this section we briefly discuss the choice 
of atoms {(fk]k and analyze some general features of vectors of stick breaking processes. While in 
the next section we focus on the main contribution of this works which is a new specification of 
the stick-vectors {Sk}k based on multivariate beta distribution. 

2.2. Atoms. The simplest assumption for the atoms is that they are common to all the measures 
Gi . Otherwise stated this means that the base measure of the atoms is 

(7) G Q (A 1 xA 2 ---xA r )=H Q (f)A. l ) 

i 

for every A±, . . . , A r measurable subsets of X, Hq being a probability measure on X, which corre- 
sponds to the case 

(8) (01k, <frk) = (<P0k, VOk, <f>0k) 

with ipok distributed according to Hq. 

Eventually one can choose a more complex structure for the law of the atoms, including co- 
variates (or exogenous effects) related to the specific block i. For instance one can assume an 
ANOVA-like scheme of the form 



(9) 



(<Plk, ■ ■ ■ , firk) = (Aofe + fi-lk, P<0k + ft-2k, • • • , fj-Ok + Mrfc) 
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where jiQk represents the overall "mean" (of the fc-th mixture component) and fj,^ the spe- 
cific "mean" for fact or i (of the fc-th mixture component). A similar choice has been used in 



De Iorio et al 



(120041) 



In many situations it is reasonable to assume that the components of the mixture are essentially 
the same for all the blocks but that they have different weights. In addition, the choice ©-([H]) 
yields a simple form of the correlation between the related random measure. This feature, which 
may be useful in the parameter elicitation, is discussed in the next subsection. 

2.3. Correlation and Dependence Structure. In the general definition given in Subsection 
12.11 the vectors Sk of stick variables are assumed to be independent. If in addition one assumes 
that they have the same distribution, i.e. that 

(10) (Sk)k>i is a sequence of i.i.d. random vectors, 

then it is easy to compute the correlation of two elements of the vector (Gi, . . . ,G r ) as well as the 
correlation between fi(y) and fj(y)- 

For the shake of simplicity we shall consider only the case i = 1 , j = 2 and set 



E[5 n S 21 ] (2E[Su] - E[g 1 2 1 ])(2E[^ 21 ] - E[S 2 21 }) 

[ > 1 ' 2 • l-E[(l-Sn)(l-S ai )]V nSfMSl] 

and, for every y, 

K G 01 (y) ■= J £2(2/ 1 x)G i{dx), K Go2 (y) ■■= J 1C 2 (y \ x)G 02 {dx), 
K Go (y) := J K.\(y \ Xi)JC 2 (y \ x 2 )G (dxi x dx 2 x X r ~ 2 ) 
where Got denotes the i-th marginal of Go- 

Proposition 1. Assume that (|10p holds true, then for all measurable set A and B 

G (AxBx X'- 2 ) - G i(A)G 02 (S) 



(12) Cor(G 1 (A),G 2 (B)) = C 1 , 2 x 



VGoi(A)(l - G 01 (A))G 02 (B)(1 - G 02 (B)) 
and for every y in Y 

(13) Cor(f 1 (y),f 2 (y)) =G 1 , 2 x ~ ^MjM 

V^GoilJ/Jl 1 - K G„ 1 (y))HG 02 (y)(i - K Go2 (y)) 

The proof of Proposition [1] is in Appendix. 
Corollary 2. Assume that (jHJ and (|10p hold true, then for every measurable set A 

(14) CoridiA)^^)) = C h2 . 
where C\_ 2 is defined in (fTT|) . 
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2.4. Partial exchangeability. Wc conclude this section by observing that [Yy 



l,...r,j > 1] 



is a partially exchangeable random array, indeed the joint law of the infinite process of observation 
is characterized by: 



(15) P{Yii G Aj i = 1, • • .r,j = 1, . . . , nj = E 



n 

i=l 



n 



K,i(yij\xi)y(dyij)Gi(dxi) 



where the expectation is respect to the joint law of (G\, . . . ,G r ). Recall that an array [Yy : i = 
1 , . . . r, j > 1] is said to be row- wise partially exchangeable if, for every n > 1, every measurable 
sets Aij and any permutations pi, . . . , p r of {1, . . . , n}, 



P{Y i:j G i = 1, . . . r,j = 1, . . . , n} = P{Y ipi{j) G Aj i = 1, . . . r, j = 1, 



In other words, the joint law is not necessarily invariant to permutations of observations from 
different groups. From a practical point of view, the partial exchangeability represents a suitable 
model for sets of data that exhibit sub-samples with some possibly different features. 



3. Beta-Product Poisson-Dirichlet Processes 

Wc propose a new class of vector of dependent probability measures (Gi, . . . , G r ) in such a way 
that (marginally) Gi is a Dirichlet process for every i. This result follows from the Sethurman's 
representation (JTJ) if one considers a multivariate distribution for (Sik, ■ • ■ , 5Vfc) such that Sik ~ 
Beta(l, a,), Vi, k. 

It is worth no ticing that the r e are man y possible definitions o f mul tivariate beta distribution 
(see for example I Olkin and Liul (|2003l ) and lNadarajah and Kota (|2005h ). but not all of them has 
a tractable stoc hastic representation and lea ds to simple Bayesian inference procedures. For this 



reason we follow 



Nadaraiah and Kota (|2005l ) and consider a suitable product of independent beta 



random variables. More specifically we apply the following result. 



Proposition 3 (jRadhakrishna Raol (| 19491) ) . If U\, U2, ■ ■ ■ , U p are independent beta random vari- 
ables with shape parameters (aj, bi), i = 1,2, ... ,p and if a.;+i = a; + hi, i = 1, 2, . . . ,p — 1, then 
the product U\Ui ■ ■ ■ U p is also a beta random variable with parameters {a%, b\ + ■ ■ ■ + b p ). 

Proof. It is easy to check that the Mellin transform of a beta random variable Z of parameters 
(a, b) is given by 



M z (s) :=E[Z S 



r(a + b)r{a + s) 



T(a)T(a + b + s) 

Hence, since a-i — ai + bi and using the independence assumption 

KA < ^ - 1HT7T s 11fiTrr s l - F ( Ql ± bl ) r ( ai ± ^ T ^ 2 ± ± S J. - F(fll ± &1 ± b ^ T( - ai + gj 

M UlU2 (s) - HU 1 \HU 2 \ - r(oi)r(oi +bl + s) T (a 2 )T(a 2 + b 2 + s) " r(ax)r( 0l +h + b 2 + s) ' 
Which gives the result for p = 2. The general case follows. □ 
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We obtain two alternative specifications of the multidimensional beta variables. Specifically, if 
we set 

(16) {S lk , S 2k , ■ ■ ■ , S rk ) := {V ok V lk ,V ok V 2k , V ok V rk ) 

with V ok ,...,V rk independent, V ik ~ Beta(a ok , a lk ) (i = 1,2, ...,r) and V ok ~ Beta(a ok + 
ai k ,a 2k ), then S ik ~ Beta(a ok ,ai k + a 2k ). 
As an alternative we consider 

(17) (S lk , S 2k , . . . , S rk ) := (V ok V lk . . . K-ife, V ok V lk . . . V r _ 2k , . . . , V ok ) 

with Vbfc, . . . , T^-ifc independent and V lk — Beta{a Qk + • • • + a ik ,a l+ i k ), i = 0, . . . ,r — 1, that 

gives - Beta(a ok ,a lk -\ h a r +i-a)- 

It should b e note d that (fl6|) resembles the specification of the matrix stick-breaking process in 



Dunson et al 



(|2008l ). In that paper all the components of the stick-breaking process are products 
of two independent beta variables with fixed parameters (1, a) and (1, (3), that precludes obtaining 
Poisson-Dirichlet marginals. For this reason we propose the specifications in (fT6)) and (fl7)) that, 
for a special choice of the parameters cti k , allow for Poisson-Dirichlet marginals. In the first 
construction we obtain a random vector with identical marginal distributions, while in the second 
construction the vector has different marginals. Moreover, in the second case Si k < S 2k < ■ ■ ■ < 
S rk , which induces an ordering on the concentration parameters of the Dirichlet marginals. These 
aspects will be further discussed in the following sections. 

3.1. Dirichlet Process marginal. For the sake of clarity, we start with r = 2. We assume 
(flTJ| and we discuss how to choose the parameters in (flTJ)) - ([T7|) in order to get DP marginals. 
Note that since (fTT)|) holds true, then S k = {Si k ,S 2k ) ~ (Sn,S 2 i). According to the construction 
schemes given in (|16[) and (|17p . with a$ k — 1, ot\ k = a±, a 2k = a 2 , two possible and alternative 
specifications of (Sii,52i) are: 

(HI) (Sn,S 2 i) := (VqVi, VqV 2 ), with Vo,Vi,V 2 independent, Vq ~ Beta(\ +ai,a 2 ) and Vi ~ 

Beta(l, ati), i = 1,2, where a\ > and a 2 > 0; 
(H2) (Sii, 5*21) := (VoVi,Vo)j with Vq,V\ independent, Vq ~ Beta(l,ai) and V\ ~ Beta{\ + 

ai,a 2 ) with a± > and a 2 > 0. 

Thanks to Proposition |3l if (HI) holds, then Sn ~ Beta(l, a± + a 2 ) and S 2 i ~ Beta(l, ol\ +0^2), 
while if (H2) holds, then S\\ ~ Beta(l, a\ + a 2 ) and S 2 \ = Vq ~ Beta(l, a±). Hence, we have the 
following result. 



Proposition 4. Under (|10[) . if (HI) holds true, G\ and G 2 are (marginally) Dirichlet processes 
with the same precision parameter a.\ +a 2 and base measures Gqi, Gq 2 respectively. If (H2) holds 
true, then the first component of {G\, G 2 ) is a Dirichlet process with precision parameter a\ + a 2 
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Correlation under HI 




Correlation under H2 




0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 

a 2 a 2 



FIGURE 1. Left column: correlation between Six and S21 under (HI) (first row) 
and (H2) (second row). Right column: correlation between G\ and G2, assuming 
0, under (HI) (first row) and (H2) (second row) 

and base measure Gqi and the second component is a Dirichlet process with precision parameter 
ax & n d base measure Go2- 

Since with this construction the GVs are Dirichlet processes, we call (Gi,G2) Beta-Product 
Dependent Dirichlet Process of parameters a = (0.1,0.2) and base measure Go, in short /3, — 
DDP(a, Go), where i = 1 for (HI) and i = 2 for (H2). In addition, when we assume (J7|), we denote 
the resulting process with — DDP(a, Ho). 

It should be noted that the two processes have different marginal behaviors. The j3i— DDP(a, Go) 
process has marginals with the same precision parameter and should be used as a prior when the 
clustering along the different vector dimension is expected to be similar. In the $2 — DDP(a, Go) 
process, the precision parameter decreases along the vector dimension. This process should be 
used as a prior when a priori one suspects that the clustering features are different in the subgroups 
of observations. 
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For parameter elicitation purposes, it is useful to analyze how the choice of {a\,a 2 ) affects 
the correlation between G\ and G 2 . Le t us start by conside r ing th e correlation between the 



stick variables. From Theorem 4 and 6 in 



Nadaraiah and Kota (120051 ) . one obtains the following 



correlation between the components S\h and S 2 h in the cases of (HI) and (H2) 



Cor(Su,S 21 ) 



, z \ for (HI) 

ai (2+ ai +a 2 ) f , m , 
(ai+a 2 )(2+Qi) \ ' 



Fig. [T] shows the correlation level between the stick-breaking components (left column) and 
the random measures (right column) for different values of ol\ and a 2 . In these graphs, the white 
color is used for correlation values equal to one and the black is used for a correlation value equal 
to zero. The gray areas represent correlation values in the (0, 1) interval under both (HI) and 
(H2) beta models. According to the graph at the top-left of Fig. [TJ one can conclude that the 
parametrization used in this paper allows for covering the whole range of possible correlation 
values in the (0,1) interval. For instance, under (HI), a low correlation between the components 
of the stick-breaking corresponds to low values of the parameter a±, say between and 0.1, for 
any choice of the parameter a 2 - 

Corollary 5. Under the same assumptions of Proposition [4j 



Ci. 



2 



(ai+a 2 + l)(ai+2) \ f ™ 

2(ai + l)(ai+a2 + l) — (ai+2) J J \ J 

(a.+^ff^+D-a, + + a 2 + 1) for (H2). 



Recall that if holds true, then for any measurable set A: 

Cor(G 1 (A),G 2 (A))=C h2 . 

The graphs at the bottom of Fig. [T] show how the parameters a\ and a 2 affect the correlation 
between the components S±h and £2/1 of the bivariate beta used in the stick breaking process and 
the correlation between the random measures G\ and G 2 - when assuming . 

It is worth noticing that, under (HI), (S11, 5 2 i) converges (in distribution) to (Vi, V 2 ) as a 2 — > 0, 
where Vi and V 2 independent random variables with distribution Beta(l, oti). While, under (H2), 
(S11, 6*21) converges to (Vb, Vq) as a 2 — > 0, where Vo is a Beta(l, a\) random variable. In particular, 
if one assumes (J7J) and (H2), when a 2 — > 0, one gets the limit situation in which all the observations 
are sampled from a common mixture of Dirichlet processes. In other words, in this limit case, 
as can be seen by p5[) for G\ = G 2 , one considers the observations (globally) exchangeable, so 
no distinction between the two blocks are allowed. The other limiting case is when one assumes 
(HI) and takes (<^ifc,<^2fe) to be independent random elements with probability distribution G01 
and G*o2- In the limit for a 2 — > 0, one obtains two independent Dirichlet processes G\ and G 2 
with base measures Gqi and Gq 2 . In other words, with this choice, one considers the blocks of 
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observations as two independent blocks of exchangeable variables and no sharing of information 
is allowed. 

The (HI) construction for the case r > 2 follows immediately from (|16|) with a^k = 1, a%k = cti, 
&2k = ot2, that is by assuming Vok, Vik, Vik, ■ ■ ■ , Vrk to be independent with Vbfc ~ -Beta(l+ai, CK2) 
and Vjfc ~ Beta(l, oil), i = 1, 2, . . . , r, where a\ > and «2 > 0. In this case, Sik has Beta(l, a\ + 
c^) distribution for i = 1, . . . ,r and hence Gi is marginally DP(a\ + a 2 , Go;)- Also the formula 
for the correlation between two measures easily extends to the case r > 2 under ([7]): 

V V 7 jV " 2(ai + l)(oi +a- 2 + 1) - (ai + 2) 
The (H2) construction extends to r > 2 by setting in (|17[) aofc = 1 and a.ik = ctj (i > 1), 
that is by taking Vbfc, . . . , V r —ik to be independent random variables with Vok ~ Beto(l, ai), 
Vik ~ Beta(l + ai,a 2 ), V r -ik ~ Beta{\ + ai + ■ ■ ■ + a r -i,a r ), where a; > for all 
i = 1, . . . , r. In this last case Sik ~ Beta(l, a\ + • • • + a r _i+i) for every i = 1, . . . , r. Hence G; is 
marginally DP(ct\ + ■ ■ ■ + a r -i + i, Go;). Under ||7J) the correlation between Gj(.A) and Gj(-A) with 
1 < z < j < r is given by 

Corr(Gi (A), Gj(A)) 

^ 2^(1 + ai + ■ ■ ■ + a r -»+iT(l + «! + ■■■ + a r -j+i)i 

2(1 + a± H h a r _ i+ i) 2 + (2 + ai H h ar-j+i)(a r _ J+ 2 H h a-r-i+i) ' 

The proof of this last result is given in the Appendix. 

3.2. Poisson-Dirichlet process marginal. Recall that a Poisson-Dirichlet process, PD(a, I, Hq). 
with parameters < I < 1 and a > — I, and base measure Hq, is obtained by taking in (Jl])-© a 
sequence of independent random variables Sk with Sk ~ Beta(l — I, a + Zfc). In this section we 
show that by a suitable choice of the parameters in p^)) - p7)) we obtain a vectors of dependent 
random measures with Poisson-Dirichlet marginals. 

In the first case use (|16j) with aok = 1 — 1, a ik = Oil and «2fc = ot-2 + Ik, that is take Vbfc, . . . , V r k 
to be independent random variables such that 

(19) Vbfc ~ Beta(l — I + a.\, d2 + Ik), Vik ~ Beta(l — l,ai) i = l,...,r 

where a\ > 0, 0:2 > 0, and < I < 1. Proposition [3] yields that 5;fc ~ Beta(l — Z, ai + 0L2 + Ik). 
In the second case use (|17|) with aofc = 1 — i, aifc = ai + Zfc, and a;fc = a; for i > 2, that is take 
Vbfc, . . • , V r -ik to be independent random variables such that 

Vbfc ~ Beta(l - I, at + lk),V lk ~ Beta(l + «i + Z(fe - 1), a 2 ), 

(20) 

. . . V r -ik ~ Beta(l + «! + ••• + a r -i + Z(fc — 1), a r ) 

with aj > i = 0, . . . , r and < I < 1. In this last case Sjfc has Beta{\ — I, ai -\ — • + a r _;+i + Ik) 
for every i = 1, . . . ,r. 

Summarizing we have proved the following 



12 FEDERICO BASSETTI, ROBERTO CASARIN, AND FABRIZIO LEISEN 

Proposition 6. If (fTB|) and (fT9"]) hold true Gi is a PD(a\ + ct2,l,Goi) for every i = l,...,r, 
while if (|17|) and (|20p /io/c? true G,; is a PD{a\ + • • • + a r _i+i, /, Goi) /or every i = 1, . . . , r. 

4. Slice Sampling Algorithm for Posterior Simulation 
For pos t erior co mputation, we pro pose an extension of the slice sampling algorithm introduced 



in IWalkerl ([20071 ) ; iKalli et al.l ([20111 ) . For the sake of simplicity we shall describe the sampling 
strategy for a vector of Beta-Product DDP with r — 2 (ft— DDP(a, Go)), see Subsection [33] The 
proposed algorithm can be easily extend to the case r > 2 and to the Beta-Product dependent 
Poisson-DP. 

Recall that in ft — DDP(a, Go) the stick variables are defined by 

(Si k ,S 2k ) ■■= {V ok V lk ,V ok V 2k ) 

for a sequence of independent vectors V k ■= (Vofc, V\ k , V2fc) with the same distribution of (Vb, V\, V 2 ) 
and the convention V 2k = 1 and V k := (Vo, k , Vi.fc) under (H2). 

Starting from ([6]) , the key idea of the slice sampling is to find a finite number of variables to be 
sampled. First we introduce a latent variable u in such a way that fi(y) is the marginal density of 

fi(v,u) = J2 J iu< W ik } KiblPik). 

k>l 

It is clear that given u, the number of components is finite. In addition we introduce a further latent 
variable d which indicates which of these finite number of components provides the observation, 
that is 

(21) fi(y,u,d) := I{u < Wi, d }ld(x\tpd)- 

Hence, the likelihood function for the augmented variables (y, u, d) is available as a simple product 
of terms and crucially d is finite. 

To be more precise we introduce the allocation variables Dij (i = 1, 2; j = 1, . . . , rij) taking 
values in N and the slice variables Uij (i = 1,2; j = 1, ...,n,) taking values on [0,1]. We shall use 
the notation 

y/ ni) : =(y a ,...,y ini ), ~(D a ,...,D ini ), := (U iU ...,U ini ) 

and we write: Cp for (<p k ) k , V for (V k ) k , U for [u[ ni) ,U^\ for [D[ ni \ D^} and Y for 

[Y 1 (ni \Y 2 {n2) }. 

The random elements (cp, V) have the law already described. While conditionally on (jp,V), 
the random vectors (Yy, Uij, -Dy), i = 1,2; j = 1, . . . , nj, are stochastically independent with the 
joint density f2Tj) . 

We conclude by observing that it can be useful to put a prior distribution even on the hypcrpa- 
rameters (ai, 02). The law of dp is assumed independent of the random vector of hyperparameters 
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a := (<5i,Q!2), while the distribution of Vj depends on a through (HI) or (H2), so we shall write 

p{v, edv 3 \a}. 

Our target is the exploration of the posterior distribution (given Y) of [<p, V, U, a, D] by Markov 
Chain Monte Carlo sampling. 

Essentially we shall use a block Gibbs sampler which iteratively simulates (p given [V, U, D, a, Y], 
[V, U, a] given [D, (p, Y] and D given \V,U,(p,a,Y 

For the one dimensional, this blocking structure case has been introduced in 



of 



(2008) and 



Kalli et al 



Papaspiliopoulos 



(|201l[) as an alternative and more efficient version of the original algorithm 



Kalli et al 



ponh 



Walker! (|2007f ). Our algorithm extends the one dimensional slice sampling of 
to the multidimensional case. This extension is not trivial as it involves generation of random 
samples from vectors of allocation and slice variables of a multivariate stick-breaking process. We 
present an efficien t Gibbs sampling algorithm by elaborating further on the blocking strategy of 



Kalli et al 



( 2011 ) 



In order to describe in details the full-conditionals of the above sketched block Gibbs sampler, 
we need some more notation. Define for i = 1, 2 and k > 1, 



V, 



{j € {l,...,m} : Dij = k}, 



Ai, k := $^I{Aj = k} = card(Di, h 

3=1 



Bi, k : {!>,., /••!• 

3=1 



Moreover, let 
(22) 



D* 



max max _D, 



i=l,2 1<j< ni 

In our MCMC algorithm we shall treat V as three blocks of random length: V = (V*, V**, V***), 
where 



V* = {V k : k E V*}, 



\ 



{V k :k?V*,k<D*}, 



\ 



{V k :k>D*} 



and V* = {k : U T> 2 ,k ^ 0}- Note tnat D * = max{fc G V*} and |D*| < D* almost surely. In 
the following subsections we give the details of the full conditionals of the blocking Gibbs sampler, 
further details on the algorithm are given in Appendix. 

4.1. The full conditional of (p. The atoms (p given [V, D, U, a, Y] are conditionally independent 
and the full conditionals are: 



(23) 



P{<p k G d<p k \D,U,V,a,Y} = P{ip k g dtp k \D,Y} 

ocG (d(p k ) K,i{Yi,j\tpi k ) IC 2 (Y 2 j\(p2k); 



where <p k = (<^ifc, <f2k)- The strategy for sampling from this full conditional depends on the specific 
form of Ki and Go- In the next section we will discuss a possible strategy for Gaussian kernels. 
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4.2. The full conditional of [V,U, a]. In order to sample from the conditional distribution of 
[V, U, a] given [D, if, Y] a further blocking is used: 

• [V™,a] given [D,(p,Y]. The joint conditional distribution of [T^*,oi] given [D,<p,Y] is 

P{V* G dv*,a G (da 1 ,da 2 )\Y,ip,D} = P{V* G dv*,&e {da l ,da 2 )\D} 

( 24 ) tt Qk{v k \D,a 1 ,a 2 ) A ia a \ 

oc I — — — -dv k ir(da 1 ,da 2 ) 

where ir(dai, da 2 ) = P{a G (dai,da 2 )} is the prior on the concentration parameters and 

(1 - vokT 2 - 1 J] v%»(l - v^r^il - v ok v lk ) B *» 



(„ ,ai+A lk +A 2kl 
u 0k 



(25) Q k (v k \D,a) := { 



under (HI) with v k = {v nk ,v lk ,v 2k ) 
4 k lk+Mk {l ~ v ok ) aM *- l v^ +A ^{l - v lk )<**-\l - v ok v lk ) B ^ 
under (H2) with v k = (v ok ,v lk ) 

To sample from (|24|) . we iterate a two-step Metropolis-Hastings (M.-H.) within Gibbs with 
full conditionals 

(26) P{V* edv*\a,D}cx J] Q k (v k \D,a)dv k 

kev* 

and 

P{&e(d ai ,da 2 )\V*,D}K 11 ] v ^ {1 - Vok) ^-l 



(27) 



i(l,ai) 

For the each element (Vo kl V\ k , V 2k ) of V* we consider a multivariate Gaussian random 
walk proposal with diagonal scale matrix t 2 I^, with r 2 = 0.05 in order to have acceptance 
rates between 0.3 and 0.5 for the elements of V*. 

• [V**,V***] given [D, V* , (p, a, Y], The V k (with k V*) are conditionally independent 
given [D,V*,<f,a,Y] with P{V k G dv k \a, D, V*} cx <2fc(u fe |A a)dv k \ik < D* and P{Vfc G 
dv\V*,Cp,D,6t,Y} = P{V k G if fc > D*. Note that if k <£ V* and fc < D* , then 
^4i,fc = in the definition of Q k (v k \D,a). In order to sample from Q k {v k \D,a) the same 
M.-H. step, used for the full conditional in ((26)) . is employed. 

• U given [V, D, <f, a, Y] . The slice variables U are conditionally independent given [V, D, if, a, Y] 
with 

Mu < Wi D i 

(28) P{Ui j G du\V, Y, <p, D} = P{Ui j G du\V, D} = ~ 3 du. 
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4.3. The full conditional of D. The D's are conditionally independent given [V, U, <p, a, Y] with 

(29) P{D itj = d\<p,V,U,a,Y} cx JC,{Y^ ld ) l{U h3 < W hd }. 



Here an importa nt remark is in order. As in the slice sampling proposed in IWalkerl <|2007f ): 



Kallietal 



(|201lh . the full conditional (|29|) samples, almost surely, from a finite n umber o f terms . 
So again it is easy to sample from this full conditional. More precisely, following IWalkerl (|2007l ). 
we note that d > N*j ensures that Wi,d < Uij where N*j (i = 1,2; j = 1, . . . , n{) is the smallest 
integer such that 

(30) 



NT 



X 11 <> 1 ',.r 



k=l 

5. Illustrations 

We apply our new Beta-product dependent Dirichlct process to make inference for mixture 
of normals and mixture of vector autoregressive processes. The resulting model and inference 
procedure have been applied on both simulated data and real data on the industrial production 
in the United States and the European Union. 

5.1. /3i — DDP(a, Ho) mixtures of Gaussian distributions. In this section, we apply our 
/3i — DDP(a, Hq) Gaussian mixture model for inference on synthetic data generated from finite 
Gaussian mixtures. More precisely we assume ([8j)-(fT0]). (HI) and Gaussian kernels K.i(y\<p) (i = 
1, 2) with means \i and variance a 1 for ip = (/x, a 2 ), i.e. 

1 



2na 

As base measure Ho(d(p) we take the product of a normal A/"(0, s -2 ) and inverse gamma IQ(\, A), 
which are conjugate distributions for the bivariatc kernel at hand. For the vector a = (0.1,0.2) of 
the precision parameters of the bivariatc Dirichlct process we consider independent gamma priors 
S(Gi) C2i)S(Ci2j (,22)- In summary the Bayesian non-parametric model is 



Y ij I (Aij . a 2 a ) l ~ M(tkj . at ) 



1.2,J > 1 



(P>ij,Vij)\Gl,G2 



Gi i = l,2 



(Gi,G 2 )|a~/3 1 -DDP(a,lfo) 

a ~ ^(Cl2,C22)^/(Cl2,C22) 

The sampling procedure for U and D given in the previous section applies straightforwardly to 
this example. We shall describe here in more details the sampling strategy for the other unknown 
quantities of the model. For the sake of simplicity we will omit indicating the dependence of the 
full conditional on the hyperparameters. 
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In order to sample from the full-conditional P{(p k G dipk\V, D, Y^ n \ U}, for k > 1 wc consider 
a two-step Gibbs sampler with full conditional distributions 

P{fL k ^d^ k \d-\D,Y^}^ 

^ J m=l,2ieD m fc " k > 



and 



(32) 



P{af&daf\ijLk,D,Y^}<x 

cxexp{-Aa- 2 }K7 2 )^ J] II KT 2 ) 1 / 2 exp { -^(F mj - 

« ex P {-(A+ ±fog> + t©K" 2 } (O^*^.^)- 1 ^ 8 



which are proportional to the density function of a normal 

^ 2 (vii+& i 



s 2 + a fc 2 (A 1 , k + A 2 . k ) ' s 2 + a fe 2 (Ai, fc + A 2tk ) 



(33) TV 
and an inverse gamma 

(34) XG (x + \(A hk + A 2 , k ), A + + t$)\ 
respectively, where A m ^, m = 1,2 have been defined in the previous section and 

(35) V% k = E Y ™ and A= E (^™-Mfc) 2 - 

A sample from the conditional joint distribution of the precision parameters and the stick 
breaking elements can be obtained following the blocking scheme described in Subsection 14.21 
Since wc assume gamma priors, £/(Cii; C21) & n d 5(Ci2 ) C22) for ct\ and a 2 respectively, (|27[) becomes 

P{a G (da u da 2 )\V*,D} cx 

(36) 1 1 

Wi + w di w, ^ Q ' 11 " 1 exp { ~ ai ^ 2l} 412-1 exp { ~ a2 ^ 22} daida2 

where C21 = C21 -Efcei5( lo g( 1/ bfc)+l g( 1 -^ifc)+log(l- V 2k )) and C22 = C22-Efcec fogC 1- ^) and 
d\ = card(T>*) and d 2 = 2d\. We simulate from the full conditional by a M.-H. step. Wc considered 
two alternative proposals. First we assume independent proposals. At the j-th iteration, given 
a"" 1 ', we simulate 



(37) a« ~Ga(Cn,C 2 i), 4° ~ S«(Ci2, C: 



22 1 



and accept with probability 
(38) min 



' B(ai* ) +l,4* ) )*B(l,aJ* ) )* 
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In our experiments this kind of proposal turns out to be highly inefficient and the M.-H. exhibits 
low acceptance rates, thus we consider a gamma random walk proposal. At the j-th iteration of 
the algorithm, given the previous value (a'f , ) of the chain, we simulate 

(39) a W „ g a {K{o^- 1] ) 2 , na^), a { * ] ~ Qai^a^) 2 , mf 1 ') 

where k represents the scale of random walk. The proposal is accepted with probability 
( P{(a<i\aP)\V*,D} giat^l^Qiat 1 ^)] 



(40) min { 1, 



[at 1 \at 1 \v*,D}g(a[* ) \at 1) )9(4*Vt 1) 

is the conditional density of the gamma random walk proposals. We set the scale parameter k in 
order to have acceptance rates close to 0.5. 

We simulate n = 50 independent vectors, (Yi i j,Y2 ! j) with j = 1, ...,n, of observations. The 
component of the vectors (Yi,j, ^2j) are independent and alternatively follow one of these models. 

• The same three-component mixture of normals (model Mixl) 

IjV(-l0,l) + ijV(0,l) + ijV(l0,i) 
IjV(-10,l) + ^(0,1) + ^(10,1) 

• Two different mixtures with two common components (model Mix2) 

j/V(0, 0.5) + ijV(3, 0.25) + - A M{2, 0.25) + j/V(5, 0.5) 
i/V(0, 0.5) + i/V(3, 0.25) + jAf (-3, 0.25) + i/V(7, 0.5) 

• The same three-component mixture of normals with different component probabilities 
(model Mix3) 

i^(-10,l) + ^(0,l) + |Af(10,l) 

+ ^(0,1) + i/V(10,l) 
b o 6 

The simulated set of data considered in the experiments are given in Fig. [2] In the left column is 

the histogram of the first component of the set of data and in the right column is the histogram 

of the second component. Then we estimate the — DDP(o, Hq) mixture model on the different 

set of data. In the inference exercise, we choose a non-informative prior specification for the 

mean a n d pre cision parameters of the base measure and set s 2 = 0.1, A = 0.5 (see for example 



where 



Walked (12007)). For the co ncentration parameters (011,0:2) of the stick-breaking components, 



we follow 



Kalhetal 



(|2011l ) and consider two alternative specifications of the priors: weakly 
informative (WI) prior and strongly informative (SI) prior. For the WI case, the hyperparameters 
setting is (£ij = 0.01, C2j = 0.01), for j = 1,2, in all the Mixl, Mix2 and Mix3 experiments. 
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Figure 2. Sample of data for the first (left) and second (right) component gen- 
erated under different model assumptions: Mixl (first row), Mix2 (second row), 
and Mix3 (third row). 



This setting corresponds to diffuse priors on the concentration parameters, with prior means 
E((5i) = E(a2) = 1 and variances V(«i) = Y(6i2) = 100, and to a low prior dependence level 
(Cor(Gi, G2) = 0.6902) between the random marginal densities. In our WI setup, a small amount 
of information exchange is allowed a priori, between the two marginal densities and the posterior 
level of information exchanged is heavily affected by the empirical evidence. For the SI case, a 
large amount of information exchange is desired instead between the two sets of data. In the SI, 
we set (C11 = 100, C21 = 400) and (C12 = 100, C22 = 200) in the Mixl and Mix3 examples and 
(£11 = 10, C21 = 100) and (C12 = 200, £22 = 100) in the Mix2 example. These settings correspond 
to a very concentrated prior and a high prior dependence level between the two marginal densities 
Cor{G l ,G 2 ) = 0.7609 and Cor(G 1 ,G 2 ) = 0.9343 respectively. 

For both the WI and SI settings, the Gibbs sampler, presented in the previous section, was run 
for 20,000 iterations. The raw output of the MCMC chain for the number of clusters is given in Fig. 
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[3J For the estimation of the number of clusters, a burn-in period of 10,000 samples was discarded. 
At each Gibbs iteration from 10,000 onwards, a sample ^2,n+i) from the predictive was 

taken. The solid lines in Figure [4] show the estimated predictive distributions using 10,000 samples 
from the Gibbs and the original sets of data. 

Fig. [5] shows the raw output and the ergodic average of the MCMC chain for the parameters 
a\ and OLi in the WI and SI prior settings. 



5.2. /3]— DDP(a, Ho) mixtures of vector autoregressive processes. In parametric models 
for the growth rate of the industrial production (business cycle), great advances have been made 
by allowing for separ ate parame t er va lues in periods (called regimes) of recession and expansion. 
The seminal paper of iHamiltonl (|1989f ) proposes to use a dynamic mixture model with two com- 
ponents for capturing clustering of observations during the recession and expansion phases in a 
business cycle. This simple model has been successfully extended in many directions. In particu- 
lar, the estimation of the number of regimes is an i mportan t issue studied in many papers (e.g., 



Kim and Murravl (120021 ) . iKim and Pigerl (|2000h and iKrolzisJ (|2000r i). The estimation of the num- 



ber of regimes is still an open issue in the analysis of the business cycle. Moreover, specifically it 
is interesting to verify whether the strong contraction in 2009 calls for the use of a higher number 
of regime than three or four in business cycle models. 

The above cited papers consider parametric models with a regime-switching mechanism and 
use some model selection criteria to estimate the number of regimes. Conversely, in this paper, we 
propose a non-parametric approach to the joint estimation of the number of regimes in multiple 
time series. We assume our Dirichlet mixture process /3i— DDP(a, Hq) as a prior for the parameters 
of a vector autoregressive model (VAR) for time series data. We consider two well studied cycles of 
the international economic system: the United States (US) and the European Union (EU) cycles. 
Even if the features of the regimes (or clusters) in the US and the EU growth rates are different, 
one could expect that the regimes in the two cycles also exhibit some dependence. For this reason, 
we apply a dependent multivariate Dirichlet process to account for the similarity between the 
clustering of the two series. In this sense, our model extends the existing literature on the use 
of univariate Dirichlet process prior for time series models. In this literature, the same clustering 
process is usually assumed for all the parameters of a multivariate model. 

We consider seasonally and working day adjusted industrial production indexes (IPI), at a 
monthly frequency from the time of April 1971 to January 2011, for the US and the EU, {X\ t }J =l 
and {X2t}f = i respectively (see first row in Fig. [6]). We take the quarterly growth rate: Yu = 
log — log Xn-4 (second row in Fig. |6]). The histograms of these time series (see histograms 
Fig. El) exhibit many modes that are the results of different regimes in the series. We consider the 
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Figure 3. Number of clusters for different prior settings (panels WI and SI), for 
the first (left column) and second component (right column) and for the different 
models: Mixl (first row), Mix2 (second row) and Mix3 (third row). 
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Figure 4. Approximated predictive density (solid lines) for the two settings 
(panels WI and SI) and for the first (left) and second (right) component of the 
data from models Mixl (first row), Mix2 (second row), and Mix3 (third row). 
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FIGURE 5. Output of the M.-H. within Gibbs for the two settings (panels WI and 
SI) for the parameters ot\ and a 2 of the models Mixl (first row), Mix2 (second 
row), and Mix3 (third row). 



following specification for the VAR model 



Hit 

M2t 



ZL OL 



o' 2p 



z[ 



Ti 
T 2 



for t = 1, . . . ,T, where 2p = (0, . . . , 0)' e R 2p , = (v hh 



,Vi,2 P ,i)' and Z t = (Y U - 



■ -,Yi 



t—pi 



Yit-i, ■ ■ ■ , Yzt-p)' and £u ~ A/"(0, af t ) with £i t and e 2s independent Vs, t. 



Hamilton! ( 1989 ) and 



Krolzig 



In this paper we consider four lags (i.e. p = 4) as for example in 
200(1). Moreo ver, as most of the forecast errors are due to shifts to the deterministic factors (see 



Krolzid (|2000l ) and lClements and Krolzid (|1998h ) . we propose a model with shifts in the intercept 



and in the volatility and assume a vector of Dirichlet processes as a prior for (fMu, c| t ), i = 1,2 



(41) 



(Ait! <Jit)\Gi,G 2 > ~"" G\ 
(M2*i 5ft) l^i, G 2 G 2 
(Gi J Gf 2 )~ft-DDP(o ) fl , ) 

Q ~ 0(Cll,C2l)^(Cl2,C22) 
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where the base measure H is a product of normal A/"(0, s -2 ^) & n d inverse gamma IQ(\, A). 

Following the standard practice in Bayesian VAR modelling, for the parameters Ti and T2 
we consider improper uniform prior on M. ip and obtain a multivariate normal as full conditional 
posterior distribution to be used in the Gibbs sampler. The charts in the first row of Fig. [7] show 
the predictive distributions (solid lines) generated by the non-parametric approach conditioning 
on all values of yu, for i = 1, 2 and t = 1, . . . , T and the best normal fits (dashed lines) for the 
empirical distributions of the two series. 

From a comparison with the empirical distribution, we note that the non-parametric approach, 
as opposed to the normal model, is able to capture asymmetry, excess of kurtosis, and multimodal- 
ity in the data. The results from our non-parametric approach are in line with the practice of 
using of time- varying parameter models (e.g., Markov-switching models) to capture asymmetry 
and non-linearity in both the US and the EU business cycles. 

The posterior distribution of the number of clusters is given in the second row of Fig. [7] The 
location of the posterior mode of the histograms allows us to conclude that the non-parametric 
approach detects three clusters for the US cycle and four clusters for the EU cycle. The result for 
the US data is coherent with the results availab le in the literature where three-regime Markov- 
switching models (see for example iKrolzia (|2000|)) are usually considered. Moreover, we observe 
that the inclusion in the sample of the 2009 negative-growth (recession) period extends the validity 
of many past empirical findings that do not include the 2009 slowdown in the economic activity. 
An inspection of the posterior mean of the atoms and of the marginal clustering (see below in 
this section) allows us to conclude that the three clusters can have the economic interpretation of 
business cycle phases associated to substantially different levels of IPI growth-rates. 

The results for the US and the EU cycles are, in a certain way, coherent with the output 
of parametric studies which suggest to consider at least three regimes. Nevertheless, the effects 
of the 2009 recession on the past empirical findings is an open issue and a matter of research. 
The result from our non-parametric approach is an interesting one because it suggests that four 
components are needed in order to capture the effects of the 2009 recession phase (see Fig. [7|) . As 
a consequence of the 2009 recession, a long left tail present in the predictive (solid line in Fig. [7]) 
is fatter than the tail of the best normal (dashed line in the same figure). 

Fig. [5] shows the sequence of predictive densities (gray area) indexed by time t, for t = 1, . . . , T. 
The predictive density for yu has been estimated conditionally on the whole set of data and has 
been evaluated sequentially over time at the current values of the predictors yu-i, ■ ■ ■ ,yu-p, for 
i = 1, 2. In this figure, the effects of the recession are evident from the presence of non-negligible 
probability values in correspondence of extremely negative growth rates that were not realized 
before 2009. Similarly, we found that, in both the expansion and recession phases, posterior 
distribution of the atoms exhibit multimodality and asymmetry. As an example Fig. [9j shows the 
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Figure 6. First row: Industrial Production Index (IPI) for the US (xi t ) and 
the EU (x2t) at monthly frequency for the period: March 1971 to January 2011. 
Second row: logarithmic quarterly changes in the US IPI (yit) and the EU IPI 
(j/2t) variables. 
0.4 : 




FIGURE 7. First row: IPI log-changes (histogram), predictive distribution (solid 
line), best normal (dashed line). Second row: number of clusters. 
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Figure 8. US and EU IPI growth rates (black lines) and predictive densities 
(gray areas) evaluated sequentially for t = 1, . . . , T, at the values of the predictors 
y it -i,...,yit-p, for i = l,2. 



approximated posterior of the atoms fin and an in periods of expansion (t = 430) and recession 
(t = 450). The posterior distribution of fin exhibits two modes in the positive half of the real 
line during an expansion phase and two modes in the negative half during a recession phase (first 
column of Fig. [9]). From the second column of the same figure, one can conclude that the volatility 
posterior distribution for both the US and the EU is more concentrated around lower values in 
expansion periods. 

In order to identify the different components of our DP mixture model, we compute the posterior 
clustering of the data and the associated values of the atoms for e ach observa tions and country. 
We apply the least square clustering method proposed orig inally in Dahl J 20061 ) . The method has 



Kim et al 



2006) and 



Rodriguez et al 



been s uccessfully used in many applications (see for example 
(|2008l )) and is based on the posterior pairwise probabilities of joint classification P{Di S — Dj t \Y} 
To estimate this matrix, one can use the following pairwise probability matrix: 

M 



M 



that is estimated by using every pair of allocat ion v a riable D\ s D l it , with s,t 



over all the I = I, . . . , M MCMC iterations. In 



1,...,T and 



Dahll (|2006h 's algorithm, one needs to evaluate 
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Figure 9. Posterior density approximations for the atoms of the US ((//it, cit)) 
and the EU ((^2t, 02*)) growth rates yu and y<it during expansion (t = 430) and 
recession period (t = 450). 

for i = j and i = 1,2. The least square marginal clustering Di t LS is the clustering D\ 



(Dl\, . . . , Djj,) (see Fig. HO)) sampled at the Zj-th iteration which minimizes the sum of squared 
deviations from the pairwise posterior probability: 

T T 



u 



ar 



jgmin VVL {D\ t )-P u 



More specifically, the first row (second row) shows the posterior probabilities that two observa- 
tions of the US cycle (EU cycle) belong to the same cluster. In the first column, one can clearly 
detect the presence of vertical and horizontal dark gray bands. They correspond to observations 
that do not cluster frequently together with other observations and that are associated with neg- 
ative growth rates. A similar remark is true for the light gray areas. In the second column of Fig. 
ITTJl one can see the different behavior of the clustering for the US and the EU during the 2009 
crisis. 
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Figure 10. Pairwise posterior probabilities for the clustering of the US data 
Pn, st and the EU data i"22,st, and the common clustering between the US and 
EU data, P 12 , st for s, t G {1, . . . , T} 



Finally, the assumption in Eq. (0 implies that the set of atoms sampled at every MCMC iter- 
ation is the same for the t wo ser i es. Th is makes the allocation variables D\ t and D l 2s comparable. 
For this reason, we apply iDahll (|2006l) 's algorithm to study the posterior probability Pij tS t that 
two observations, each one from a different series (i.e. i ^ j), belong to the same cluster. That 
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Figure 11. Atom (jj,^ , a\ t ) associated to the LS marginal clustering U for i = 1,2. 

gives a measure of association between the two clustering, induced by the dependent DP, for the 
two series. The estimated pairwise probability is given in the third row of Fig. [TU]that shows the 
probability that two observations, one of the US cycle and another one of the EU cycle, belong 
to the same cluster. The white and light gray lines show that the two marginal clustering share 
some atoms. 

The least square clustering allows us to find the posterior clustering of the data and to identify 
the different clusters. For the US cycle, the observations cluster together in three groups (see 
Fig. rPTj) and the atoms associated with the three clusters are (/iit, ait) € {(—2.4142,1.3625), 
(0.136,0.6166), (0.6216, 1.0618)} and lead to the identification of the cluster as recession, normal 
expansion, and strong expansion phases. For the EU cycle, the observations are classified in four 
groups (sec Fig. Q3]) and the atoms are ((-10.6170,0.6600), (-1.2183,2.7857), (0.0760,0.5794), 
(0.4909, 1.2165)) and are interpreted as strong recession, normal recession, normal expansion, and 
strong expansion phases. These results on the features of the cycle phases are coherent with the 
recent findings in the business cycle literature with an exception for the EU cycle, which presents 
a fourth cluster of observations with very high negative growth rate (—10.6170) corresponding to 
the 2009 recession period. 

6. Conclusions 

We introduce a new class of multivariate dependent Dirichlet processes for modeling vectors of 
random measures. We discuss some properties of the process. Wc apply the dependent Dirichlet 
process to the context of non-parametric Bayesian inference and provide an efficient MCMC 
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algorithm for posterior computation. Since our process is particularly suitable for groups of data 
that exhibit a different clustering behavior, we apply it to multiple time series analysis. We provide 
an original application to the joint analysis of the US and the EU business cycles and show that 
our non-parametric Bayesian model is able able to highlight some important issues for this kind 
of data. 
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Appendix A. The algorithm 

In the Block Gibbs Sampler described in Section [4] in principle one needs to sample an infinite 
number of V k and (p k ■ But in order to proceed with the chain it suffices to sample a finite number 
of VfcS to check condition ([30]) and the finite number of (p k to be used in ([29j) . 

For the sake of clarity we summarize here the blocked Gibbs sampling algorithm. 



THE ALGORITHM 

• INITIAL STEP. Initialize U, D,a. With D compute D* by using 

(HI. 

• UPDATING STEP. Suppose to have a sample of all the variables 
involved in the algorithm. This variables that comes from the previ- 
ous step is labeled with " old" . The variables that will be generated 
in the next step are labeled with " new" . 

(1) The (V*, a)\new are sampled by using the D\old and (|24p with 
Metropolis within Gibbs step as described in Subsection 14.21 

(2) The V**\new are sampled by using the D\old and a\new by a 
Metropolis step as described in Subsection 14.21 

(3) The U\new are sampled by using the (V* , V** , a)\new and ([28]); 

(4) JV t *j are computed by using (|30j) . with U\new and Vj\new. If 
some Vk\new with k > D*\old are needed they are sampled 
from the prior P{Vk <E dvk\a}. 

(5) The 0j\new for j = 1,...,N*, with N* := 
max^i^ maxi< 3 < ni (JV*j), are sample by using (|23j) and 
D\old as described in ([23]) and in Section IQI 

(6) The D\new are sampled by using (|29[) with U\new, V^\new 
and ipi | new, . . . , (pN* \new. 



Appendix B. Proofs 
Proof of Proposition \12[ First of all observe that 



E[G 1 (A)G 2 (B)} = J2 niA^ik)lB^2h)WW lk W 2hl 

h>l,k>l 



(42) = J2 nu&ikWiiB&2h)}nw lk }E[w 2h } 

h>l,k>l,kjtk 



J2 E i I AxB(9lh,<f2h)MWi h W 2h } 



h>l 
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Now note that the following equalities hold 



(43) 



h>l 



SlhS2h (1 — Si m )(l — S 2 



m<h — 1 



E[5ii5 2 i] 



(44) y, e s ihS2k n o—si m ) n(i-s 2l 

h=£k m<h-l l<k-l 

Combining g2]) with (|33"]l -(|i4" |) it follows that 

E[G 1 (A)G 2 (B)} = G (A xBx X r ~ 2 ) 



= E[Sii]+E[S 2 i]-2E[giig 2 i] 
E[5ii]+E[5 2 i]-E[S n S 2 i] 

E[5u5 21 ] 



-G 01 (A)G 02 (B)- 



E[ 1 Si 1 ]+E[S , 2i]-E[5 U( S , a i] 
[S 11 }+E[S 21 ]-2E[S 11 S 21 ] 



E[5n]+E[Sai]-E[5nS 2 i] 
Since E[G,(-)] = G 0i (-), i = 1, . . . ,r, it follows 

E[5 U 5 21 ] 



(45) Coi;[Gi(A),G? 2 (B)] 



E[5n]+E[5 2 i]-E[5iiS 2 i] 



[G (A x B x X 1 '- 2 ) - G 01 (A)Go2(B)]. 



In a similar fashion 
(46) 

(47) 

and then 
(48) 

(49) 



E[Gi(A) 
E[G 2 (B) 



2l _ Goi(A)E[Sf 1 }+2G 2 01 (A)(nS 11 }-E[S 2 11 }) 



2E[5n] - E^] 



_ G 02 (B)E[Sl] + 2<G 2 2 (B)(E[S 21 ] - E[S^]) 



2E[Sai] - E^] 
Vor[Gi(A)]=Goi(A)(l-Goi(i4)) 



E[S 2 



nJ 



2E[5n] - E^] 



Far[G 2 (E)] = G 02 (S)(1 - G 02 (B)) 



E[S 2 21 ] 
2E[S 21 ] - E[S 2 21 ] 



□ 



Proof of Corollary [5[ By direct calculation or using the results in lNadaraiah and Kota (|2005h one 
obtains 



E(Sai) =E(Su) 

E{S 21 Sn) = 
for (HI) and 



-, E(5 2 ' 1 )=E(5 1 2 1 ) 



1 + a.\ + a 2 
B{2,a l )B{2 1 a 1 )B{a 2 ,a 1 +i) 

B(l,ai)B(l,ai)B(a 1 + l,a 2 ) ~ (o-i + \){a 1 +a 2 + l)(a 1 + a 2 + 2) 



(1 + ai + a 2 )(2 + ai + a 2 ) 
ai + 2 



E(5 U ) = 
E(S 2 ^ - 



1 + a± + a 2 
~ (l + ai+a 2 )(ai+a 2 + 2) 



-, E(Sai) = 
2 



1 



1 + ai 

E(S 2 21 ) = 



E(S 21 S n ) 



(l + ai)(2 + ai) 
2 



B(3,a 1 )B(2 + a 1 ,a 2 ) 

B{l,a 1 )B(a 1 + l,a 2 ) ~ (2 + ai)(l + a ± + a 2 ) 
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for (H2). Hence the correlation between the two random measures is 
Cor(G 1 (A),G 2 (A)) = 



E[5 2 i5ii] /(2E[S 2 i] - E[5 2 2 1 ])(2E[5ii] - E^]) 



1 - E[(l - S- 2 i)(l - Su)] V E[fi?i]E[<Sfi] 
(ai + a 2 + l)(ai +2) 
2(ai + l)(ai + a 2 + 1) - (a-i + 2) 



and 

Cor(Gi(A),C?a(i4)) 



E[5 2 iSn] (2E[5 21 ] - E[Sf 1 ])(2E[5 u ] - E[^]) 



1 - E[(l - S 21 )(l - S n )] V E[fi?i]E[<S?i] 
2(ax + l) 



(ai + 2)(2a 1 + a 2 + l)-ai 
for (HI) and (H2), respectively. 



a/ (qi + l)(ai + d2 + 1) 



□ 



Proof of (US]). For the sake of simplicity write Vfc in place of T4i. Recall that Vq ~ Beta(l,ai) 
and, for 1 < fc < r — 1, Vfc ~ Beta(l +«! + •••+ a^, Let 1 < z < j < r. Since 

5ii = V Vi . . . V r -i, one gets 

Si,iS jA = l,fl 2 . . . V^-K-i+i • ■ ■ V r -i- 

After some computations, using the fact that Vj are independent, 

E[Si iSj i] = % -. 

[2 + ai + ■■ ■ + a r _j-_|_ij(l + «! + ••• + a r -i+i) 

In addition, one has 

E[Si il = , E[S?i] = 7 J- r. 

l + c*i + --- + av-i+i MJ (l + ai + --- + a r _i+i)(2 + ai + "- + a r _ i+ i) 

Set 



r ^Pu^j.i] // m^j,!] w m^ij 

° lJ ' E[S M ] + E[5j, x] - E^xS,-, i] V V E[Sy J V E[5|J 

Simple algebra gives 



//2E[5, J \ /2E[5 7l ] \ r — 7 

V (wjj - x ) ( Ef^i - : ) = + + ■ ■ ■ + +«! + •••+ a P _,- +1 ) 



and 

E[S M %i] 



E[5i,i] + E[% x ] - E[£ M 5,,i] 

2(1 + qi H + a r _j+i) 



2(1 + ai H h a r - j+ i) 2 + (2 + ai H h ay- j+i)(ar-j+2 H h »r-!+i) 

That is 

c _ 2^(1 + oti j ha r -., + i)(l + ai j hay-j+i)^ 

lJ 2(1 + «! + ••• + qv-j+i) 2 + (2 + aH h a r - J+ i)(a,--j + 2 H + av-;+i) 

which gives ((T8j) since dj = corr(Gi(A), Gj(A)). □ 
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Full- conditionals. The joint distribution of [V, (p, U, D, Y, a] is 

P{V £dv,ip€ d(p,Y G dy,U"G du {n) ,D = d {n) ,a € (d ai ,da 2 )} 

Thi 

(5°) = [ II II < W ii*i,j}K'(yi,j\ t Pdi,j) ®»=1,2 ^"LldVijdUij 

i=l,2j=l 

®fc>i [-P{U fe G du fc |5 = (ai, a 2 )} (8 G a (d(p k ) ® P{a G (dari, da 2 )} 
where = VQ^Vik Yij<k(^ ~ v 0jVij), with the convenction that «2fc = 1, for every fc, under (i^)- 
Proof o/ p3 ]) -p5 ]l . From {H one gets 

P{V edv,&e (dax,da 2 )\Y (n) ,(p,D} 

Tli 

oc\ ~\w itDzj ®j>i P{Vj € dvj\a= (a 1 ,a 2 )}P{a € (dai,da 2 )}. 

i=l,2j=l 

Now note that 



zr 

i=l,2j=l j=l 



1 wi, Dij = n '•„• 



(l-^i,) By (l-«o^ 2j )^. 

□ 
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